SUMMARY

This is the first of a series of papers devoted to the
computational solution of dynamic programming processes. In
it we use the functional-equation approach to treat a tactical
air-warfare model that A. Mengel has previously considered by
means of classical variational techniques.

ON THE COMPUTATIONAL SOLUTION OF
DYNAMIC-PROGRAMMING PROCESSES-I:
ON A TACTICAL AIR-WARFARE MODEL OF MENGEL

1.  INTRODUCTION

This is the first of a series of papers devoted to the
computational solution of dynamic-programming processes.
Although the papers are linked together by a common method,
each of the diverse problems we shall treat possesses particular
features of interest and difficulty that make a detailed
exposition of the coding worthwhile.

It  is  planned  eventually  to  present  all  the  papers  of  the 
series  in  the  form  of  a  book. 

We  would  like  to  express  our  appreciation  to  E.  W.  Paxson 
for  a  number  of  helpful  comments  and  suggestions  which  we  have 
incorporated  in  the  paper. 


2.  ATTRITION  PROCESSES 

The study of attrition processes arising from military
campaigns leads to a class of variational problems that are
particularly well suited to dynamic programming.

Consider the following model. Let the state of Blue's
forces at time t be specified by the vector x, with components
$x_1, x_2, \ldots, x_M$, and the state of Red's forces be specified by y,
with components $y_1, y_2, \ldots, y_N$. At each stage of the process,
which may be discrete or continuous (and this has less to do
with reality than with the type of computing machine which is
available, a digital computer or an analog computer), each side
allocates a certain portion of the forces to combat, obtaining


P-1072 

Revised 

5-2V57 

in this way a certain payoff and suffering, in return, a certain
attrition. Let z be the allocation vector of Blue and w the
allocation vector of Red. The natural constraints are, in vector
form,

(1)  $0 \le z \le x, \qquad 0 \le w \le y;$

that is,

(2)  $0 \le z_i \le x_i, \quad i = 1,2,\ldots,M,$
     $0 \le w_j \le y_j, \quad j = 1,2,\ldots,N.$

The single-stage payoff is determined as some function

(3)  $R(x,y,z,w)$

(in practice, usually the most difficult function to decide upon),
and we assume that we know the attrition due to combat, so that

(4)  $\frac{dx}{dt} = F(x,y,z,w), \quad x(0) = q_1,$
     $\frac{dy}{dt} = G(x,y,z,w), \quad y(0) = q_2,$

where $q_1$ and $q_2$ are the initial forces.

The mathematical problem is then that of determining

(5)  $\min_{w} \max_{z} \int_0^T R(x,y,z,w)\,dt,$

where T is the duration of the process, subject to the constraints
(2) and the relations (4).

Alternatively, we may wish to determine

(6)  $\max_{z} \min_{w} \int_0^T R(x,y,z,w)\,dt.$

Since, in general, the determination of min max or max min
is a formidable problem, particularly if min max $\ne$ max min, we
shall reduce the magnitude of the problem by fixing Red's strategy,
say $w = w^*$, and proceeding to determine

(7)  $\max_{z} \int_0^T R(x,y,z,w^*)\,dt,$

subject to

(8)  (a)  $0 \le z \le x,$

     (b)  $\frac{dx}{dt} = F(x,y,z,w^*), \quad x(0) = q_1,$
          $\frac{dy}{dt} = G(x,y,z,w^*), \quad y(0) = q_2.$

Problems of this type are difficult, using conventional
methods, because of the presence of the constraints and the
analytic structure of the functions F, G and R.


3.  MENGEL'S MODEL

Let us now consider the attrition process that has been
discussed by Arnold Mengel, using classical variational
techniques [5].

Considering only air forces consisting of one type of plane,
for the purpose of an exploratory model, we have the equations

(1)  $\dot{x} = G_1(x,y,s_2) = r_1 - a_1 x - x\,(1 - e^{-b_2 s_2 y/x}),$
     $\dot{y} = G_2(x,y,s_1) = r_2 - a_2 y - y\,(1 - e^{-b_1 s_1 x/y}),$

for fixed $s_2$, with the payoff function$^{(1)}$

(2)  $J(s_1) = \int_0^T \left[(1-s_1)x - (1-s_2)y\right] dt.$

In equations (1), $r_1$ and $r_2$ are the replacement rates of new aircraft.
The terms $a_1 x$ and $a_2 y$ represent operational (non-combat) attrition
rates. The term $x(1 - e^{-b_2 s_2 y/x})$ is Red's counter-air effort over
$(t, t+dt)$, with "kill potential" $b_2 s_2 y$. The chance that any one of
the x aircraft is killed by one of the $s_2 y$ attacks is $b_2$. The average
number of attacks per Blue aircraft is $n = s_2 y/x$. Hence the probability
of survival is $(1-b_2)^n \approx \exp(-b_2 n)$.


Here

(3)  $x(t)$ = the number of Blue aircraft at time t,
     $y(t)$ = the number of Red aircraft at time t.

The allocation variables are

(4)  $s_1(t)$ = fraction of Blue sorties on counter-air strikes,
     $s_2(t)$ = fraction of Red sorties on counter-air strikes.

As mentioned above, we shall fix $s_2$ in this case by
assuming various constant levels, $s_2$, and then maximizing $J(s_1)$
over all $s_1(t)$ satisfying $0 \le s_1(t) \le 1$.
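The attrition terms and the payoff integrand of this section can be written out as a minimal Python sketch. The function names are ours, the parameter values are the ones quoted with the graphs later in the paper, and the treatment of an empty force (no losses) is an assumption made only to avoid dividing by zero.

```python
import math

# Parameter values quoted with the graphs later in the paper.
R1 = R2 = 100.0   # replacement rates, planes per stage
A1 = A2 = 0.1     # non-combat attrition rates
B1 = B2 = 0.2     # kill probabilities


def survival(b, n):
    """exp(-b*n), the approximation to (1 - b)**n used in the text."""
    return math.exp(-b * n)


def g1(x, y, s2):
    """Rate of change of Blue's force when Red flies a fraction s2 counter-air."""
    if x <= 0.0:
        return R1  # assumption: an empty force suffers no losses
    return R1 - A1 * x - x * (1.0 - survival(B2, s2 * y / x))


def g2(x, y, s1):
    """Rate of change of Red's force when Blue flies a fraction s1 counter-air."""
    if y <= 0.0:
        return R2  # assumption: an empty force suffers no losses
    return R2 - A2 * y - y * (1.0 - survival(B1, s1 * x / y))


def payoff_rate(x, y, s1, s2):
    """Integrand of J: Blue's non-counter-air effort minus Red's."""
    return (1.0 - s1) * x - (1.0 - s2) * y
```

With equal forces of 1000 planes and no Red counter-air effort, replacements exactly balance operational losses, so `g1(1000.0, 1000.0, 0.0)` vanishes up to rounding.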

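4.  DYNAMIC-PROGRAMMING APPROACH-I

Setting

(1)  $\max_{s_1} J(s_1) = f(q_1, q_2, T),$

we obtain, as in [1], the nonlinear partial differential equation

(2)  $\frac{\partial f}{\partial T} = \max_{0 \le s_1 \le 1} \left[ (1-s_1)q_1 - (1-s_2)q_2 + G_1(q_1,q_2,s_2)\frac{\partial f}{\partial q_1} + G_2(q_1,q_2,s_1)\frac{\partial f}{\partial q_2} \right],$

with

(3)  $f(q_1, q_2, 0) = 0.$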
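The step from the definition (1) to the partial differential equation can be sketched as follows. This is the standard principle-of-optimality argument, given here under the assumption that f is differentiable; the small time increment $\Delta$ is our notation for this sketch.

```latex
\begin{aligned}
f(q_1,q_2,T) &= \max_{0\le s_1\le 1}\Bigl\{\bigl[(1-s_1)q_1-(1-s_2)q_2\bigr]\Delta
   + f\bigl(q_1+G_1\Delta,\; q_2+G_2\Delta,\; T-\Delta\bigr)\Bigr\} + o(\Delta),\\
f\bigl(q_1+G_1\Delta,\; q_2+G_2\Delta,\; T-\Delta\bigr)
  &= f(q_1,q_2,T) + \Delta\Bigl(G_1\frac{\partial f}{\partial q_1}
   + G_2\frac{\partial f}{\partial q_2}-\frac{\partial f}{\partial T}\Bigr)+o(\Delta).
\end{aligned}
```

Substituting the second line into the first, cancelling $f(q_1,q_2,T)$, dividing by $\Delta$ and letting $\Delta \to 0$ gives a relation of the form (2). The condition (3) states that a process of zero duration yields no return.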

This equation may be solved numerically using approximating
difference equations in the usual fashion. In practice, we
encountered a great deal of difficulty with this method due to
instability arising from transition curves. Consequently, we
went over to the method we shall present in the next section.

This method has applications to the numerical integration of other
types of partial differential equations, a matter which we have
discussed elsewhere.

(1) The payoff function J of Section 3 measures the total excess
of Blue's combat capability over Red's during the campaign on
missions other than counter-air.

5.  DYNAMIC-PROGRAMMING APPROACH-II

Let us consider the following discrete process. Divide the
interval [0,T] into N equal parts of length $\Delta$,

     $0, \; \Delta, \; 2\Delta, \; \ldots, \; k\Delta, \; \ldots, \; N\Delta = T.$

Let us assume that decisions may be made only at times
$k\Delta$, k = 0,1,2,...,N-1. As far as actual processes are concerned,
this may be a more realistic assumption than that of a continuous
process.

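In place of the equations (8b) of Section 2, we have the
difference equations, or recurrence relations,

(1)  $x_{k+1} = x_k + G_1(x_k, y_k, s_{2k})\Delta, \quad x_0 = q_1,$
     $y_{k+1} = y_k + G_2(x_k, y_k, s_{1k})\Delta, \quad y_0 = q_2,$

where

(2)  $x_k = x(k\Delta), \quad y_k = y(k\Delta), \quad s_{1k} = s_1(k\Delta), \quad s_{2k} = s_2(k\Delta).$

The sequence $\{s_{1k}\}$ is chosen to maximize

(3)  $J(s_1) = \sum_{k=0}^{N-1} \left[(1-s_{1k})x_k - (1-s_{2k})y_k\right],$

subject to the restriction $0 \le s_{1k} \le 1$.

Let

(4)  $\max_{s_1} J(s_1) = f_N(q_1, q_2),$

with

(5)  $f_1(q_1, q_2) = q_1 - (1-s_2)q_2.$

The basic recurrence relation used to compute the sequence $\{f_N\}$ is

(6)  $f_{N+1}(q_1,q_2) = \max_{0 \le s_1 \le 1} \left[ (1-s_1)q_1 - (1-s_2)q_2 + f_N\left(q_1 + G_1(q_1,q_2,s_2)\Delta,\; q_2 + G_2(q_1,q_2,s_1)\Delta\right) \right],$

for N = 1,2,....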
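To make the recurrence concrete, the following minimal Python sketch evaluates $f_n$ at a single point by direct recursion over a handful of candidate values of $s_1$. It is not the tabular method used for the actual computation; the candidate set, the function names, the stage length of one, and the convention that an empty force loses nothing are all assumptions of the sketch. The cost grows exponentially with n, so it is suitable only for very short processes.

```python
import math

R1 = R2 = 100.0   # replacement rates per stage (values quoted with the graphs)
A = 0.1           # non-combat attrition rate
B = 0.2           # kill probability


def step(q1, q2, s1, s2, delta=1.0):
    """Advance both forces one stage via the difference equations."""
    loss1 = q1 * (1.0 - math.exp(-B * s2 * q2 / q1)) if q1 > 0.0 else 0.0
    loss2 = q2 * (1.0 - math.exp(-B * s1 * q1 / q2)) if q2 > 0.0 else 0.0
    return (q1 + (R1 - A * q1 - loss1) * delta,
            q2 + (R2 - A * q2 - loss2) * delta)


def f(n, q1, q2, s2, candidates=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Return (f_n(q1, q2), maximizing first-stage s1), with f_0 = 0."""
    if n == 0:
        return 0.0, None
    best_value, best_s1 = -math.inf, None
    for s1 in candidates:
        nq1, nq2 = step(q1, q2, s1, s2)
        value = ((1.0 - s1) * q1 - (1.0 - s2) * q2
                 + f(n - 1, nq1, nq2, s2, candidates)[0])
        if value > best_value:
            best_value, best_s1 = value, s1
    return best_value, best_s1
```

For a one-stage process the sketch reproduces (5): `f(1, 1000.0, 1000.0, 0.2)` gives a return of $1000 - 0.8 \cdot 1000 = 200$ with $s_1 = 0$.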


6.  TIME-DEPENDENT PROCESSES

Let us consider a process of the same general type in which
the attrition and payoff functions depend upon time. Thus

(1)  $\frac{dx}{dt} = G_1(x,y,s_2,t), \quad x(0) = q_1,$
     $\frac{dy}{dt} = G_2(x,y,s_1,t), \quad y(0) = q_2,$

and we wish to choose $s_1$ so as to maximize

(2)  $J(s_1) = \int_0^T P(x,y,s_1,t)\,dt.$

In this case, we keep the terminal point T fixed and describe
the state of the process by means of the resources and the starting
point. The discrete maximization problem is then: Maximize

(3)  $J_R(s_1) = \sum_{k=R}^{N} P(x_k, y_k, s_{1k}, k),$

subject to the constraints

(4)  $x_{k+1} = x_k + G_1(x_k, y_k, s_{2k}, k), \quad x_R = q_1,$
     $y_{k+1} = y_k + G_2(x_k, y_k, s_{1k}, k), \quad y_R = q_2.$

Setting

(5)  $\max_{s_1} J_R(s_1) = f_R(q_1, q_2),$

we obtain the recurrence relations

(6)  $f_R(q_1,q_2) = \max_{0 \le s_1 \le 1} \left[ P(q_1,q_2,s_1,R) + f_{R+1}\left(q_1 + G_1(q_1,q_2,s_2,R),\; q_2 + G_2(q_1,q_2,s_1,R)\right) \right],$

for R = 0,1,2,...,N-1, with

(7)  $f_N(q_1,q_2) = \max_{0 \le s_1 \le 1} P(q_1,q_2,s_1,N).$
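The backward recurrence with a fixed terminal stage can be sketched in Python as below. The routine takes the stage payoff and the stage increments as arguments; Red's fixed $s_2$ is assumed to be folded into those functions. The toy functions at the end are purely hypothetical and are chosen only so that the answer can be checked by hand: payoff is collected in the last stage alone, and each stage of full counter-air effort halves Red's force.

```python
import math


def f_stage(R, N, q1, q2, P, G1, G2, candidates=(0.0, 0.5, 1.0)):
    """Return (f_R(q1, q2), maximizing s1) by the backward recurrence.

    P(q1, q2, s1, k) is the stage payoff; G1(q1, q2, k) and
    G2(q1, q2, s1, k) are the stage increments of the two forces.
    """
    best_value, best_s1 = -math.inf, None
    for s1 in candidates:
        value = P(q1, q2, s1, R)
        if R < N:  # stages R+1, ..., N remain after this one
            value += f_stage(R + 1, N,
                             q1 + G1(q1, q2, R),
                             q2 + G2(q1, q2, s1, R),
                             P, G1, G2, candidates)[0]
        if value > best_value:
            best_value, best_s1 = value, s1
    return best_value, best_s1


# Hypothetical toy data, not taken from the paper.
def toy_P(q1, q2, s1, k):
    return (1.0 - s1) * q1 - q2 if k == 2 else 0.0


def toy_G1(q1, q2, k):
    return 0.0


def toy_G2(q1, q2, s1, k):
    return -0.5 * s1 * q2
```

With these toy functions, `f_stage(0, 2, 10.0, 8.0, toy_P, toy_G1, toy_G2)` is `(8.0, 1.0)`: Blue attacks Red's aircraft in stages 0 and 1, reducing them from 8 to 2, and collects 10 - 2 in the final stage. The dependence of the best $s_1$ on the stage index is exactly what the time-dependent formulation is meant to capture.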


7.  DISCUSSION  OF  COMPUTATIONAL  PROCEDURES 

The recursive nature of the problem makes it particularly
suited to digital computation. A fairly sizable problem can be
solved in 500 instructions, leaving the bulk of high-speed
storage available for tabulation of the functions. In this
initial study the luxury of floating-point arithmetic was
allowed, due to uncertainty concerning the ranges of the
variables. Considerable additional time and space could be saved in

The program itself can be divided into 3 logical sections.
A master routine does the bookkeeping, tallying, and sequencing;
a subroutine evaluates $P(q_1,q_2,s_1) + f_{R-1}(q_1+G_1,\, q_2+G_2)$,
given $q_1$, $q_2$, $s_1$, $s_2$, and a table of $f_{R-1}$; a second
subroutine performs the maximization of the above expression over
the interval $0 \le s_1 \le 1$. The flow chart governing the
computation is shown in Figure 8.


In more detail, the computation proceeds as follows. Under
control of the master, the constants $\Delta q_1$, $\Delta q_2$, $s_2$, N, and m+1
are input; $\Delta q_1$ and $\Delta q_2$ determine the density of the grid over
which the function $f_R$ is to be evaluated; the parameter $s_2$ is
Red's constant strategy; N is the number of stages for which
the process is allowed to continue; and m+1 determines the size
of the grid, its dimensions being $m\Delta q_1$ by $m\Delta q_2$.

The quantity $f_1(q_1,q_2)$ is evaluated over the grid as

     $\max_{0 \le s_1 \le 1} \left[ P(q_1,q_2,s_1) + f_0(q_1+G_1,\, q_2+G_2) \right].$

Since the return from a zero-stage war is identically zero, the
maximum always occurs when $s_1 = 0$, so that $f_1(q_1,q_2)$ is merely
$q_1 - (1-s_2)q_2$. This corresponds to the fact that during the last
stage of a war Blue's airpower will be directed entirely against
Red's ground forces.

The calculation of $f_2$, with $f_1$ now known, is not quite so
trivial. However, due to the recurrence relation, this calculation
actually defines the remainder of the program. Suppose we
wish to evaluate $f_2(Q_1,Q_2)$, where $(Q_1,Q_2)$ is some grid point
$(i\Delta q_1, j\Delta q_2)$. For a particular $S_1$ we evaluate
$(Q_1 + G_1(Q_1,Q_2,s_2),\; Q_2 + G_2(Q_1,Q_2,S_1))$. This determines a
point X in the $q_1,q_2$ plane. Now X falls within a rectangle of
dimensions $\Delta q_1$ by $\Delta q_2$, where $f_1$ is known at the four
corners, and $f_1(X)$ is found by linear interpolation. By adding
$(1-S_1)Q_1 - (1-s_2)Q_2$, we determine the value of the expression to
be maximized for $s_1 = S_1$. We need only to repeat this process
where $s_1$ takes on values between 0 and 1 to determine the maximum.
Since for the majority of a process $s_1$ = 0 or 1, it is expedient
first to test for an endpoint maximum before searching the
interior region. Since $f_R$ for each R, R = 1,2,...,N, is evaluated
over a grid of $(m+1)^2$ points, it is essential to optimize the
search process. Consequently the technique described in [4] was
adopted.
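The evaluation at one grid point for one trial value $S_1$ can be sketched in Python as follows. The names `step`, `bilinear` and `stage_value` are ours. Clamping points that leave the grid to its edge and treating an empty force as suffering no losses are assumptions of the sketch; the text does not say how the original program handled those cases.

```python
import math

R1 = R2 = 100.0   # replacement rates per stage (values quoted with the graphs)
A = 0.1           # non-combat attrition rate
B = 0.2           # kill probability


def step(q1, q2, s1, s2):
    """Forces after one stage of the difference equations."""
    loss1 = q1 * (1.0 - math.exp(-B * s2 * q2 / q1)) if q1 > 0.0 else 0.0
    loss2 = q2 * (1.0 - math.exp(-B * s1 * q1 / q2)) if q2 > 0.0 else 0.0
    return q1 + R1 - A * q1 - loss1, q2 + R2 - A * q2 - loss2


def bilinear(table, dq1, dq2, q1, q2):
    """Interpolate table[i][j] = f(i*dq1, j*dq2) linearly at (q1, q2)."""
    m1, m2 = len(table) - 1, len(table[0]) - 1
    u = min(max(q1 / dq1, 0.0), float(m1))   # assumption: clamp to the grid
    v = min(max(q2 / dq2, 0.0), float(m2))
    i, j = min(int(u), m1 - 1), min(int(v), m2 - 1)  # lower-left corner
    tu, tv = u - i, v - j                            # position inside the cell
    return ((1 - tu) * (1 - tv) * table[i][j] + tu * (1 - tv) * table[i + 1][j]
            + (1 - tu) * tv * table[i][j + 1] + tu * tv * table[i + 1][j + 1])


def stage_value(prev_table, dq1, dq2, Q1, Q2, S1, s2):
    """Stage payoff at (Q1, Q2) plus the interpolated previous table at X."""
    nq1, nq2 = step(Q1, Q2, S1, s2)
    return ((1 - S1) * Q1 - (1 - s2) * Q2
            + bilinear(prev_table, dq1, dq2, nq1, nq2))
```

Because $f_1$ is linear in $q_1$ and $q_2$, linear interpolation reproduces it exactly between grid points, which is one way to check the interpolation routine before using it on later tables.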

Once the function $f_R$ has been evaluated for a fixed R, R+1
replaces R, the newly calculated table $f_R$ replaces $f_{R-1}$ in
high-speed storage, and the calculation of $f_{R+1}$ begins.

Since $s_1$, the Blue strategy which maximizes f, is generally
of more interest than the resulting payoff, f, a table of $s_1$
associated with each $f_R$ is stored and punched out prior to the
computation of $f_{R+1}$.

When R reaches N, the assumed length of the conflict, the
following information has been obtained:

1)  The return attainable by Blue in an N-stage war, where
Blue enters the conflict with $q_1$ planes, Red with $q_2$, Red uses
the fixed allocation $s_2$ between air and ground support, and Blue
uses an optimal allocation. This by definition is $f_N(q_1,q_2)$.

2)  Blue's optimal strategy during the first stage of the
N-stage war. This is the $s_1$ which maximizes $f_N$.

3)  Blue's optimal strategy during the first stage of a war
of duration R, R = 1,2,...,N-1, for any initial forces $q_1$ and $q_2$.
These tables are punched during the stage-by-stage calculation.
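The stage-by-stage tabulation that produces these tables can be sketched in Python as follows. Two simplifications are ours and should not be read as the original program: every table is kept in memory rather than overwriting the previous one, and the maximization is a plain scan over a few candidate values of $s_1$ instead of the search technique of [4]. A square grid with a single spacing `dq` is also assumed.

```python
import math

R1 = R2 = 100.0   # replacement rates per stage (values quoted with the graphs)
A = 0.1           # non-combat attrition rate
B = 0.2           # kill probability


def step(q1, q2, s1, s2):
    """Forces after one stage; an empty force is assumed to lose nothing."""
    loss1 = q1 * (1.0 - math.exp(-B * s2 * q2 / q1)) if q1 > 0.0 else 0.0
    loss2 = q2 * (1.0 - math.exp(-B * s1 * q1 / q2)) if q2 > 0.0 else 0.0
    return q1 + R1 - A * q1 - loss1, q2 + R2 - A * q2 - loss2


def bilinear(table, dq, q1, q2):
    """Linear interpolation on a square grid, clamped at the edges."""
    m = len(table) - 1
    u = min(max(q1 / dq, 0.0), float(m))
    v = min(max(q2 / dq, 0.0), float(m))
    i, j = min(int(u), m - 1), min(int(v), m - 1)
    tu, tv = u - i, v - j
    return ((1 - tu) * (1 - tv) * table[i][j] + tu * (1 - tv) * table[i + 1][j]
            + (1 - tu) * tv * table[i][j + 1] + tu * tv * table[i + 1][j + 1])


def tabulate(n_stages, m, dq, s2, candidates=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Return (f_tables, s_tables); entry R holds f_R and its maximizing s1."""
    f_prev = [[0.0] * (m + 1) for _ in range(m + 1)]   # f_0 is identically zero
    f_tables, s_tables = [f_prev], [None]
    for _ in range(n_stages):
        f_new = [[0.0] * (m + 1) for _ in range(m + 1)]
        s_new = [[0.0] * (m + 1) for _ in range(m + 1)]
        for i in range(m + 1):
            for j in range(m + 1):
                Q1, Q2 = i * dq, j * dq
                best_value, best_s1 = -math.inf, 0.0
                for s1 in candidates:
                    nq1, nq2 = step(Q1, Q2, s1, s2)
                    value = ((1 - s1) * Q1 - (1 - s2) * Q2
                             + bilinear(f_prev, dq, nq1, nq2))
                    if value > best_value:
                        best_value, best_s1 = value, s1
                f_new[i][j], s_new[i][j] = best_value, best_s1
        f_tables.append(f_new)
        s_tables.append(s_new)
        f_prev = f_new
    return f_tables, s_tables
```

A call such as `tabulate(3, 4, 500.0, 0.2)` returns four value tables (for R = 0 to 3) and the matching strategy tables; the table for R = 1 contains $q_1 - (1-s_2)q_2$ with $s_1 = 0$ everywhere, as stated in the text.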

The process determining explicitly an optimal policy is
essentially that described above, but in reverse. Knowing R,
the stage, and $(q_1,q_2)$, refer to the table of $s_1$'s to determine
the strategy associated with $(q_1,q_2)$. Employing this strategy
we find ourselves at the (R+1)st stage and Blue possesses $q_1+G_1$
planes against Red's $q_2+G_2$. Furthermore, evaluation of
$(1-s_1)q_1 - (1-s_2)q_2$ produces the payoff during the (R)th time
interval. By referring to the tables of optimal strategies for the
(R+1)st stage and initial forces $q_1+G_1$ and $q_2+G_2$, we determine
the optimal allocation for the (R+1)st period. The process is one of
repeatedly determining the initial strategy in wars of decreasing
length and decreasing forces. The sequence $\{s_1\}$ defines Blue's
optimal policy.
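The table look-up procedure just described can be sketched in Python as follows. The layout of `s_tables` (one table per number of remaining stages, indexed by grid point) and the use of the nearest grid point for the look-up are assumptions of the sketch; the text says only that the tables are referred to.

```python
import math

R1 = R2 = 100.0   # replacement rates per stage (values quoted with the graphs)
A = 0.1           # non-combat attrition rate
B = 0.2           # kill probability


def step(q1, q2, s1, s2):
    """Forces after one stage; an empty force is assumed to lose nothing."""
    loss1 = q1 * (1.0 - math.exp(-B * s2 * q2 / q1)) if q1 > 0.0 else 0.0
    loss2 = q2 * (1.0 - math.exp(-B * s1 * q1 / q2)) if q2 > 0.0 else 0.0
    return q1 + R1 - A * q1 - loss1, q2 + R2 - A * q2 - loss2


def trace_policy(s_tables, dq, q1, q2, s2, n_stages):
    """Follow the tabulated first-stage strategies through an n_stages war.

    s_tables[R][i][j] is assumed to hold the optimal first-stage s1 for a
    war with R stages remaining, starting from forces (i*dq, j*dq).
    Returns one (q1, q2, s1, payoff) tuple per stage.
    """
    history = []
    for remaining in range(n_stages, 0, -1):
        table = s_tables[remaining]
        # Assumption: look up the nearest grid point, clamped to the grid.
        i = min(max(int(round(q1 / dq)), 0), len(table) - 1)
        j = min(max(int(round(q2 / dq)), 0), len(table[0]) - 1)
        s1 = table[i][j]
        payoff = (1.0 - s1) * q1 - (1.0 - s2) * q2
        history.append((q1, q2, s1, payoff))
        q1, q2 = step(q1, q2, s1, s2)
    return history
```

Given strategy tables in this layout, `trace_policy(s_tables, 500.0, 5000.0, 5000.0, 0.2, N)` lists, stage by stage, the forces, Blue's allocation and the payoff earned, which is the sequence defining Blue's policy for those initial forces.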

At this time, only the most essential computational variations
have been investigated. For example, it was found that
due to the near-linear behavior of $f_R$ over the range 0 to 10,000,
$\Delta q_1 = \Delta q_2 = 500$ gave sufficiently accurate results to justify
its use. This same property led to the choice of linear
interpolation throughout the grid. Two versions of the Fibonaccian
search method of S. Johnson [4] were considered, one using
predetermined points of evaluation, the other calculating the points.
The latter, of course, is more general, but the advantages
offered by less calculation and faster convergence led to the
choice of the former. Printout of all functions and strategy
values along the grid was made optional. Considerable time was
saved by suppression of printing when $s_1$ equaled 0, a frequent
occurrence during the initial phases of a calculation.
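For orientation, a Fibonacci search for the maximum of a unimodal function, preceded by the endpoint test mentioned earlier, can be sketched in Python as below. This is a common textbook form with computed evaluation points, not a reconstruction of either version coded from [4]; the endpoint test with a small offset `eps` and the default of 20 evaluations are assumptions of the sketch.

```python
def fibonacci_max(func, lo=0.0, hi=1.0, n=20):
    """Locate the maximum of a unimodal func on [lo, hi] with n evaluations."""
    fib = [1, 1]
    while len(fib) <= n:
        fib.append(fib[-1] + fib[-2])
    x1 = lo + fib[n - 2] / fib[n] * (hi - lo)
    x2 = lo + fib[n - 1] / fib[n] * (hi - lo)
    f1, f2 = func(x1), func(x2)
    for k in range(n, 2, -1):
        if f1 < f2:
            # The maximum lies in [x1, hi]; the old x2 becomes the new x1.
            lo = x1
            x1, f1 = x2, f2
            x2 = lo + fib[k - 2] / fib[k - 1] * (hi - lo)
            f2 = func(x2)
        else:
            # The maximum lies in [lo, x2]; the old x1 becomes the new x2.
            hi = x2
            x2, f2 = x1, f1
            x1 = lo + fib[k - 3] / fib[k - 1] * (hi - lo)
            f1 = func(x1)
    return (x1, f1) if f1 >= f2 else (x2, f2)


def maximize_s1(func, n=20, eps=1e-3):
    """Test both endpoints of [0, 1] first, then search the interior."""
    at_zero, at_one = func(0.0), func(1.0)
    if at_zero >= func(eps):          # not increasing at 0: take the endpoint
        return 0.0, at_zero
    if at_one >= func(1.0 - eps):     # still increasing at 1: take the endpoint
        return 1.0, at_one
    return fibonacci_max(func, 0.0, 1.0, n)
```

Each pass of the loop reuses one of the two previous evaluations, so only one new function value is needed per step. After n evaluations the maximum is located to within about $2/F_n$ of the original interval length, where $F_n$ is the n-th Fibonacci number.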

8.  GRAPHS

Results  are  shown  on  the  following  pages.  Figure  1  shows 
the  changing  relative  strengths  of  the  rival  air  forces  when  Blue 
employs  an  optimal  policy.  Due  to  Blue's  initial  counter-air 
tactics,  Red's  force  is  reduced  during  the  early  stages,  while 
Blue's force drops suddenly in the later stages when counter-ground
strategy is used. Figure 2 shows that Blue's initial allocation
of planes is against the Red air force if the total number of planes
in each force is fairly even. If Blue has a marked numerical
superiority or inferiority, a counter-ground strategy should be
employed. Figure 3 depicts Blue's strategy where initial forces
are  equal.  The  next  three  graphs  show  Blue's  strategy  as  it  changes 
with  time  for  all  initial  conditions.  Figure  7  shows  the  excess 
sorties  flown  by  Blue  as  a  result  of  employing  an  optimal,  rather 
than  constant,  strategy.  The  use  of  an  optimal  policy  in  this 
particular  example  is  shown  to  be  equivalent  to  about  800  planes; 
i.e.,  with  the  given  parameter  values  Blue  can  start  with  800  fewer 
planes  and  still  fly  as  many  sorties  as  Red  during  a  l^tage 
conflict.

Throughout these numerical examples, we have fixed both sides'
replacement rates, $r_1$ and $r_2$, to be 100 planes per stage, non-combat
attrition rates, a, as .1, and kill probabilities, b, as .2.

Figure  8  shows  the  flow  diagram  used  for  computation. 

This structure of the maximizing strategy is typical of a
large class of problems. The reason for it lies in the concavity
of the function appearing in (5.6) as a function of $s_1$. This
concavity in turn is based upon the linearity of the pay-off
function, and the concavity of the attrition function. Two
conclusions can be drawn from this. In the first place, it shows
that great mathematical simplifications ensue when we introduce
concave functions, or convex functions if we are minimizing. Thus
if we have functions which for one reason or another are not
concave, it may be well initially to use concave approximations
to these functions. On the other hand, these results show the
dangers inherent in mathematical models. In the real world, such
concentration on counter-air or counter-surface at various phases
of the campaign is dubious. Catastrophic loss by ground forces
might occur before the time T of the campaign has elapsed. Such
effects are not measured by the "uniform" pay-off function J.

There are, of course, dangers in conclusions based upon this
one-sided approach in which we fix Red's strategy. Iteration
procedures may be considered in which we alternately fix one side's
policy and then the other's. These must be used with care, since
we know from much simpler games that unless some feedback from
stage to stage is used, the results will not converge.

Fig. 4. Variation of Blue's strategy with time (N = 8 to l^) for
initial Blue force $q_1$ and initial Red force $q_2$ between
0 and 10,000 and for Red strategy $s_2$ = 0.2.